The Cluster-Abstraction Model: Unsupervised Learning of Topic Hierarchies from Text Data
Author
Abstract
This paper presents a novel statistical latent class model for text mining and interactive information access. The described learning architecture, called the Cluster-Abstraction Model (CAM), is purely data-driven and utilizes context-specific word occurrence statistics. In an intertwined fashion, the CAM extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. An annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation of the model parameters is derived. The benefits of the CAM for interactive retrieval and automated cluster summarization are investigated experimentally.
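As a hedged illustration of the annealing idea only, the sketch below applies annealed EM to a much simpler model than the CAM: a flat mixture of multinomials over documents, with no topic hierarchy and no abstraction levels for words. It assumes the common deterministic-annealing form in which the E-step posterior P(c|d) is proportional to [P(c) Π_w P(w|c)^n(d,w)]^β, with the inverse temperature β raised in steps toward 1 (where the update reduces to standard EM). This tempering form, the function name `annealed_em`, the β schedule, and the additive `smoothing` constant are illustrative assumptions, not the paper's exact updates.

```python
import math
import random


def annealed_em(counts, n_clusters, vocab_size,
                betas=(0.2, 0.5, 0.8, 1.0), iters_per_beta=30,
                smoothing=1e-3, seed=0):
    """Annealed EM for a flat mixture of multinomials (hypothetical sketch).

    Simplification: no hierarchy, so this is NOT the CAM itself.
    counts: one dict per document d, mapping word id w -> count n(d, w).
    Returns (prior P(c), word_probs P(w|c), posteriors P(c|d)).
    """
    rng = random.Random(seed)
    # Random, slightly asymmetric start for P(w|c); each row sums to 1.
    word_probs = []
    for _ in range(n_clusters):
        row = [1.0 + rng.random() for _ in range(vocab_size)]
        total = sum(row)
        word_probs.append([x / total for x in row])
    prior = [1.0 / n_clusters] * n_clusters
    posteriors = []

    for beta in betas:  # inverse temperature, raised step by step to 1
        for _ in range(iters_per_beta):
            # E-step (assumed tempering form):
            # P(c|d) proportional to [P(c) * prod_w P(w|c)^n(d,w)]^beta,
            # evaluated in log space to avoid underflow on long documents.
            posteriors = []
            for doc in counts:
                scores = [
                    beta * (math.log(prior[c]) +
                            sum(n * math.log(word_probs[c][w])
                                for w, n in doc.items()))
                    for c in range(n_clusters)
                ]
                top = max(scores)
                weights = [math.exp(s - top) for s in scores]
                z = sum(weights)
                posteriors.append([x / z for x in weights])

            # M-step: smoothed re-estimates of P(c) and P(w|c)
            # from the soft assignments (smoothing keeps logs finite).
            n_docs = len(counts)
            prior = [
                (sum(p[c] for p in posteriors) + smoothing)
                / (n_docs + n_clusters * smoothing)
                for c in range(n_clusters)
            ]
            for c in range(n_clusters):
                row = [smoothing] * vocab_size
                for doc, p in zip(counts, posteriors):
                    for w, n in doc.items():
                        row[w] += p[c] * n
                total = sum(row)
                word_probs[c] = [x / total for x in row]

    return prior, word_probs, posteriors
```

Starting with a small β keeps the posteriors soft in early iterations, which is the usual motivation for annealing: it is intended to make the final solution less sensitive to the random initialization than running plain EM (β = 1) from the start.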
Similar sources
Mining bilingual topic hierarchies from unaligned text
Recent years have seen an exponential growth in the amount of multilingual text available on the web. This situation raises the need for novel applications for organizing and accessing multilingual content. Common examples of such applications include Multilingual Topic Tracking, Cross-Language Information Retrieval systems, etc. Most of these applications rely on the availability of multilingua...
Learning Concept Hierarchies through Probabilistic Topic Modeling
With the advent of the semantic web, various tools and techniques have been introduced for presenting and organizing knowledge. Concept hierarchies are one such technique, which has gained significant attention due to its usefulness in creating domain ontologies that are considered an integral part of the semantic web. Automated concept hierarchy learning algorithms focus on extracting relevant concepts ...
Incremental Construction of Topic Hierarchies using Hierarchical Term Clustering
Topic hierarchies are very useful for managing, searching, and browsing large repositories of text documents. Hierarchical clustering methods are used to support the construction of topic hierarchies in an unsupervised way. However, traditional methods are ineffective in scenarios with growing text collections. In this paper, an incremental method for the construction of topic hierarchies...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A large amount of news is spread across the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...